Disparity index: a simple statistic to measure and test the homogeneity of substitution patterns between molecular sequences.
Abstract
A common assumption in comparative sequence analysis is that the sequences have evolved with the same pattern of nucleotide substitution (homogeneity of the evolutionary process). Violation of this assumption is known to adversely impact the accuracy of phylogenetic inference and tests of evolutionary hypotheses. Here we propose a disparity index, I_D, which measures the observed difference in evolutionary patterns for a pair of sequences. On the basis of this index, we have developed a Monte Carlo procedure to test the homogeneity of the observed patterns. This test does not require a priori knowledge of the pattern of substitutions, extent of rate heterogeneity among sites, or the evolutionary relationship among sequences. Computer simulations show that the I_D test is more powerful than the commonly used χ² test under a variety of biologically realistic models of sequence evolution. An application of this test in an analysis of 3789 pairs of orthologous human and mouse protein-coding genes reveals that the observed evolutionary patterns in neutral sites are not homogeneous in 41% of the genes, apparently due to shifts in G + C content. Thus, the proposed test can be used as a diagnostic tool to identify genes and lineages that have evolved with substantially different evolutionary processes as reflected in the observed patterns of change. Identification of such genes and lineages is an important early step in comparative genomics and molecular phylogenetic studies to discover evolutionary processes that have shaped organismal genomes.
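The abstract gives no formulas, so the Python below is only a sketch under stated assumptions. It assumes the per-site definitions C_D = 1/2 * sum_i (x_i - y_i)^2, where x_i and y_i are the counts of nucleotide i in sequences X and Y, and I_D = (C_D - n_d) / n, where n_d is the number of differing sites and n is the alignment length; check both against the original paper before relying on them. The Monte Carlo step uses a hypothetical randomization scheme (at each differing site, swap which sequence carries which nucleotide with probability 1/2, on the assumption that the two lineages are exchangeable when they share one substitution pattern). This is not necessarily the authors' procedure. All function names are illustrative.

```python
import random
from collections import Counter

NUCLEOTIDES = "ACGT"


def _check(x: str, y: str) -> None:
    # Assumption: aligned, gap-free, upper-case A/C/G/T sequences of equal length.
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    if not set(x + y) <= set(NUCLEOTIDES):
        raise ValueError("only A, C, G and T are handled in this sketch")


def composition_distance(x: str, y: str) -> float:
    """Assumed C_D = 1/2 * sum_i (x_i - y_i)^2 over the four nucleotide counts."""
    cx, cy = Counter(x), Counter(y)
    return 0.5 * sum((cx[base] - cy[base]) ** 2 for base in NUCLEOTIDES)


def disparity_index(x: str, y: str) -> float:
    """Assumed I_D = (C_D - n_d) / n; expected to be near 0 under homogeneity."""
    _check(x, y)
    n_d = sum(nx != ny for nx, ny in zip(x, y))
    return (composition_distance(x, y) - n_d) / len(x)


def monte_carlo_p_value(x: str, y: str, replicates: int = 1000, seed: int = 0) -> float:
    """Hypothetical randomization test (not necessarily the paper's procedure):
    at each differing site, swap the two nucleotides between the sequences with
    probability 1/2, recompute I_D, and count replicates at least as extreme."""
    _check(x, y)
    rng = random.Random(seed)
    observed = disparity_index(x, y)
    diff_sites = [(nx, ny) for nx, ny in zip(x, y) if nx != ny]
    n, n_d = len(x), len(diff_sites)
    hits = 0
    for _ in range(replicates):
        # delta[base] tracks x_i - y_i; identical sites cancel, so only
        # differing sites need to be revisited.
        delta = Counter()
        for nx, ny in diff_sites:
            if rng.random() < 0.5:
                nx, ny = ny, nx
            delta[nx] += 1
            delta[ny] -= 1
        c_d = 0.5 * sum(delta[base] ** 2 for base in NUCLEOTIDES)
        if (c_d - n_d) / n >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (replicates + 1)
```

For example, disparity_index("AAAA", "CCCC") returns 3.0: C_D = 1/2 * (4^2 + 4^2) = 16 and n_d = 4, so I_D = (16 - 4) / 4. Identical sequences give I_D = 0 and a p-value of 1.0, because every replicate is as extreme as the observed value.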
Similar resources
Evolutionary distance estimation under heterogeneous substitution pattern among lineages.
Most of the sophisticated methods to estimate evolutionary divergence between DNA sequences assume that the two sequences have evolved with the same pattern of nucleotide substitution after their divergence from their most recent common ancestor (homogeneity assumption). If this assumption is violated, the evolutionary distance estimated will be biased, which may result in biased estimates of d...
Computation of the Sadhana (Sd) Index of Linear Phenylenes and Corresponding Hexagonal Sequences
The Sadhana index (Sd) is a newly introduced cyclic index. Efficient formulae for calculating the Sd (Sadhana) index of linear phenylenes are given and a simple relation is established between the Sd index of phenylenes and of the corresponding hexagonal sequences.
Codon bias patterns in photosynthetic genes of halophytic grass Aeluropus littoralis
Codon bias refers to differences in the frequency of occurrence of synonymous codons in coding DNA. Patterns of codon usage and optimal codon use differ significantly among organisms. These differences reflect the long-term action of natural selection and other evolutionary processes. Genetic drift, mutation, and regulation of gene expression are the main causes of codon bias. In this stu...
(Short paper) Phylogenetic analysis and molecular evolution of leptin
In the current study, the phylogeny and molecular evolution of mammalian leptin were investigated. Data were obtained and aligned by searching genome databases, and all examined mammals contained only a single copy of leptin. The nucleotide substitution rates of the sequences and the molecular evolution of leptin were calculated by maximum likelihood and neighbor-joinin...
Genome-wide analysis of primate and rodent protein-coding and associated non-coding nucleotide sequences
Materials and methods: We obtained gene data (coding sequences, 5’ and 3’ UTRs, intron sequences, and 5,000 bases of the 5’ and 3’ flanking regions) from Ensembl [http://www.ensembl.org] after determining the Ensembl IDs from the online database InParanoid7 [http://inparanoid.sbc.su.se] for all known orthologs among four mammalian species (two primate and two rodent): human (Homo sapiens), chimp...
Journal: Genetics
Volume 158, Issue 3
Published: 2001